On Convergence and Optimality of Best-Response Learning with Policy Types in Multiagent Systems — Appendix
Abstract
Proof. Kalai and Lehrer [2] studied a model which can be equivalently described as a single-state SBG (i.e. |S| = 1) with a pure type distribution and product posterior. They showed that, if a player's assessment of future play is absolutely continuous with respect to the true probabilities of future play (i.e. any event that has positive true probability is assigned positive probability by the player), then (1) must hold. In our case, absolute continuity always holds by Assumption 5, the fact that the prior probabilities Pj are positive, and the fact that the type distribution is pure (from which we can infer that the true types always have positive posterior probability). In this proof, we extend the convergence result of [2] to multi-state SBGs with pure type distributions. Our strategy is to translate an SBG Γ into a modified SBG Γ̂ which is equivalent to Γ in the sense that the players behave identically, and which is ...
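As an illustrative aside (not part of the paper's formal argument), the following minimal Python sketch shows the mechanism behind the absolute-continuity observation above: with a strictly positive prior over a finite set of types, and observations generated by one of those types, the Bayesian posterior of the true type can never reach zero, because the true type assigns positive probability to every action it actually plays. The policies, prior, and all names below are hypothetical, chosen only for illustration.

import numpy as np

# Posterior over a finite set of candidate types, updated by Bayes' rule.
# Illustrates why a strictly positive prior plus a pure type distribution
# implies that the true type always retains positive posterior probability.

rng = np.random.default_rng(0)

# Three hypothetical types, each a fixed policy over two actions.
types = np.array([
    [0.9, 0.1],  # type 0
    [0.5, 0.5],  # type 1 (the true type in this example)
    [0.2, 0.8],  # type 2
])
true_type = 1

posterior = np.full(3, 1.0 / 3.0)  # strictly positive prior (the Pj)

for t in range(200):
    action = rng.choice(2, p=types[true_type])  # observe the true type's action
    posterior *= types[:, action]               # Bayes: weight by likelihood
    posterior /= posterior.sum()                # renormalise
    # The true type's likelihood of the observed action is always positive,
    # so its posterior never reaches zero (absolute continuity in miniature).
    assert posterior[true_type] > 0

print(posterior)  # mass concentrates on types consistent with observed play

Over many observations the posterior concentrates on the types whose predictions match the true play, which is the intuition behind the merging result of [2] that the proof extends.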
Similar resources
On Convergence and Optimality of Best-Response Learning with Policy Types in Multiagent Systems
While many multiagent algorithms are designed for homogeneous systems (i.e. all agents are identical), there are important applications which require an agent to coordinate its actions without knowing a priori how the other agents behave. One method to make this problem feasible is to assume that the other agents draw their latent policy (or type) from a specific set, and that a domain expert c...
Unifying Convergence and No-Regret in Multiagent Learning
We present a new multiagent learning algorithm, RVσ(t), that builds on an earlier version, ReDVaLeR. ReDVaLeR could guarantee (a) convergence to best response against stationary opponents and either (b) constant bounded regret against arbitrary opponents, or (c) convergence to Nash equilibrium policies in self-play. But it makes two strong assumptions: (1) that it can distinguish between self-...
Convergence, Targeted Optimality, and Safety in Multiagent Learning
This paper introduces a novel multiagent learning algorithm, Convergence with Model Learning and Safety (or CMLeS in short), which achieves convergence, targeted optimality against memory-bounded adversaries, and safety, in arbitrary repeated games. The most novel aspect of CMLeS is the manner in which it guarantees (in a PAC sense) targeted optimality against memory-bounded adversaries, via ef...
A Multiagent Reinforcement Learning algorithm to solve the Community Detection Problem
Community detection is a challenging optimization problem that consists of searching for communities that belong to a network under the assumption that the nodes of the same community share properties that enable the detection of new characteristics or functional relationships in the network. Although there are many algorithms developed for community detection, most of them are unsuitable when ...
Multiagent reinforcement learning: algorithm converging to Nash equilibrium in general-sum discounted stochastic games
Reinforcement learning has turned out to be a technique that has allowed robots to ride a bicycle, computers to play backgammon at the level of human world masters, and such complicated high-dimensional tasks as elevator dispatching to be solved. Can it come to the rescue in the next generation of challenging problems, like playing football or bidding on virtual markets? Reinforcement learning that provides a way o...
Publication date: 2014